feat/Broker status - version/image extraction improvements#243
Conversation
|
How did you find that slow function (updateStatusWithDockerImageAndVersion)? |
bc10845 to
fbb56a9
Compare
operator was logging status updates for each broker every 5-15s |
| app: kafka-operator | ||
| component: operator |
There was a problem hiding this comment.
| app: kafka-operator | |
| component: operator |
| app: kafka-operator | ||
| component: operator |
There was a problem hiding this comment.
| app: kafka-operator | |
| component: operator |
| r.KafkaCluster.Spec.GetClusterImage(), r.KafkaCluster.Spec.HeadlessServiceEnabled) | ||
| if err != nil { | ||
| return err | ||
| func (r *Reconciler) updateStatusWithDockerImageAndVersion(brokers map[int32]*banzaiv1beta1.BrokerConfig, log logr.Logger) error { |
There was a problem hiding this comment.
Goroutines have no cancellation path. If the controller shuts down mid-reconcile, these goroutines run to completion (or until JMX times out). Worth passing a context.Context from the reconcile call if JMX extraction can be long-running.
| result := <-ch | ||
| if result.err != nil { | ||
| return result.err | ||
| } |
There was a problem hiding this comment.
Drain the hole channel first then update the broker status so all or none are updated.
Co-authored-by: Razvan Dobre <dobre@adobe.com>
Co-authored-by: Razvan Dobre <dobre@adobe.com>
fbbb905 to
a81b695
Compare
a81b695 to
7cac66e
Compare
|
@azun - what do you think of adding condition on when a broker's kafka version needs to be rertrieved? I was thinking that the Kafka version doesn't change often and there are conditions we can implement to have it fetch it. This will reduce the need to retrieve on every reconcile. Please check out this PR: #249 Overall, the PR looks good and will reduce the time to reconcile. Nice work. |
|
Code Review — feature/reconcile-improvements ▎ Auto-generated pre-flight review · 5 files · 6 commits 🔴 HIGH — Label removal may break alertmanager-peerauthentication charts/kafka-operator/templates/operator-deployment-with-webhook.yaml The removed component: alertmanager label is used as a matchLabels selector in alertmanager-peerauthentication.yaml. After upgrade, the PeerAuthentication Suggested fix: Add a NOTES.txt upgrade warning, or verify no active selectors depend on these labels before removing them. 🟡 MEDIUM — PodMonitor bypasses kube-rbac-proxy when both features are enabled charts/kafka-operator/templates/podmonitor.yaml:29 The PodMonitor targets port: metrics (pod port 8080, plain HTTP). When authProxy.enabled: true (the default), this bypasses the RBAC proxy entirely — any Suggested fix: Add a Helm fail guard when both are enabled, or target port https (8443) with TLS/bearer-token config. 🟡 MEDIUM — PodMonitor missing outer prometheusMetrics.enabled guard charts/kafka-operator/templates/podmonitor.yaml:1 Every other Prometheus template in this chart wraps its block with {{- if .Values.prometheusMetrics.enabled }}. The new PodMonitor only checks Suggested fix: 🔵 LOW — http.Client allocated per JMX call defeats connection pooling pkg/jmxextractor/extractor.go:94 &http.Client{Timeout: 30 * time.Second} is constructed on every call. With the new concurrent goroutine-per-broker model, this spawns N independent clients Suggested fix: Promote the client to a field on jmxExtractor, initialized once in NewJMXExtractor. ✅ Concurrency refactor looks correct The goroutine-per-broker fan-out in updateStatusWithDockerImageAndVersion is sound: the channel is buffered to len(brokers) and each goroutine sends exactly |
Description
Performance improvements: parallel scrape for broker version and image.
Type of Change
Checklist